
    MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions

    In this paper, we propose a general class of algorithms for optimizing an extensive variety of nonsmoothly penalized objective functions that satisfy certain regularity conditions. The proposed framework utilizes the majorization-minimization (MM) algorithm as its core optimization engine. The resulting algorithms rely on iterated soft-thresholding, implemented componentwise, allowing for fast, stable updating that avoids the need for any high-dimensional matrix inversion. We establish a local convergence theory for this class of algorithms under weaker assumptions than previously considered in the statistical literature. We also demonstrate the exceptional effectiveness of new acceleration methods, originally proposed for the EM algorithm, in this class of problems. Simulation results and a microarray data example are provided to demonstrate the algorithm's capabilities and versatility. (Comment: A revised version of this paper has been published in the Electronic Journal of Statistics.)
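    The componentwise soft-thresholding update at the heart of such algorithms is simple to sketch. The following is a minimal ISTA-style iteration for the lasso, used here only as an illustration of iterated soft-thresholding without matrix inversion; it is not the paper's full MM framework, and all function names are illustrative.

    ```python
    import numpy as np

    def soft_threshold(x, lam):
        """Componentwise soft-thresholding: S(x, lam) = sign(x) * max(|x| - lam, 0)."""
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    def ista_lasso(X, y, lam, n_iter=500):
        """Iterated soft-thresholding for 0.5 * ||y - X b||^2 + lam * ||b||_1.
        Each sweep is a gradient step on the smooth part followed by a
        componentwise shrinkage; no high-dimensional matrix is inverted."""
        t = 1.0 / np.linalg.norm(X, 2) ** 2   # step size from the Lipschitz constant of X'X
        b = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (X @ b - y)          # gradient of the least-squares term
            b = soft_threshold(b - t * grad, t * lam)
        return b
    ```

    The shrinkage step is what produces exact zeros in the estimate, which is why this style of update is a natural engine for nonsmooth penalties.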

    Online Updating of Statistical Inference in the Big Data Setting

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage of, or access to, the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting. (Comment: Submitted to Technometrics.)
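    In the linear model case, the essence of an online-updating scheme is to accumulate the cross-product matrices block by block, so that each raw data block can be discarded once absorbed. A minimal sketch (the class and its names are illustrative, and the pseudoinverse is one simple way to tolerate rank-deficient blocks, not the paper's exact estimator):

    ```python
    import numpy as np

    class OnlineLeastSquares:
        """Maintains X'X and X'y as running sums over data blocks."""

        def __init__(self, p):
            self.XtX = np.zeros((p, p))
            self.Xty = np.zeros(p)

        def update(self, X_block, y_block):
            # Absorb one block; the block itself need not be stored afterwards.
            self.XtX += X_block.T @ X_block
            self.Xty += X_block.T @ y_block

        def coef(self):
            # Pseudoinverse handles rank deficiency in individual blocks.
            return np.linalg.pinv(self.XtX) @ self.Xty
    ```

    Because X'X and X'y are additive across blocks, the streamed estimate coincides with the full-data least-squares fit while storage stays at O(p^2) regardless of the number of observations.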

    Bayesian Modeling and Inference for Nonignorably Missing Longitudinal Binary Response Data with Applications to HIV Prevention Trials

    Missing data are frequently encountered in longitudinal clinical trials. To better monitor and understand progress over time, one must handle the missing data appropriately and examine whether the missing data mechanism is ignorable or nonignorable. In this article, we develop a new probit model for longitudinal binary response data. It resolves a challenging issue in estimating the variance of the random effects, and substantially improves the convergence and mixing of the Gibbs sampling algorithm. We show that when improper uniform priors are specified for the regression coefficients of the joint multinomial model via a sequence of one-dimensional conditional distributions for the missing data indicators under nonignorable missingness, the joint posterior distribution is improper. A variation of the Jeffreys prior is thus established as a remedy for the improper posterior distribution. In addition, an efficient Gibbs sampling algorithm is developed using a collapsing technique. Two model assessment criteria, the deviance information criterion (DIC) and the logarithm of the pseudomarginal likelihood (LPML), are used to guide the choices of prior specifications and to compare the models under different missing data mechanisms. We report on extensive simulations conducted to investigate the empirical performance of the proposed methods. The proposed methodology is further illustrated using data from an HIV prevention clinical trial. © Institute of Statistical Science. All rights reserved.

    Online Updating of Survival Analysis

    When large amounts of survival data arrive in streams, conventional estimation methods become computationally infeasible since they require access to all observations at each accumulation point. We develop online-updating methods for carrying out survival analysis under the Cox proportional hazards model. Our methods are also applicable with time-dependent covariates. Specifically, we propose online-updating estimators, as well as their standard errors, for both the regression coefficients and the baseline hazard function. Extensive simulation studies are conducted to investigate the empirical performance of the proposed estimators. A large colon cancer dataset from the Surveillance, Epidemiology, and End Results program and a large venture capital dataset with time-dependent covariates are analyzed to demonstrate the utility of the proposed methodologies. Supplemental files for this article are available online.

    A new Bayesian joint model for longitudinal count data with many zeros, intermittent missingness, and dropout with applications to HIV prevention trials

    In longitudinal clinical trials, it is common for subjects to permanently withdraw from the study (dropout) or to return to the study after missing one or more visits (intermittent missingness). Count response data in HIV prevention clinical trials also routinely contain a large proportion of zeros. In this paper, a sequential multinomial model is adopted for dropout, and a conditional model is subsequently constructed for intermittent missingness. The new model captures the complex structure of missingness and incorporates dropout and intermittent missingness simultaneously. The model also allows us to easily compute the predictive probabilities of different missing data patterns. A zero-inflated Poisson mixed-effects regression model is assumed for the longitudinal count response data. We also propose an approach to assess the overall treatment effects under the zero-inflated Poisson model. We further show that the joint posterior distribution is improper if uniform priors are specified for the regression coefficients under the proposed model. Variations of the g-prior, the Jeffreys prior, and a maximally dispersed normal prior are thus established as remedies for the improper posterior distribution. An efficient Gibbs sampling algorithm is developed using a hierarchical centering technique. A modified logarithm of the pseudomarginal likelihood and a concordance-based area under the curve criterion are used to compare the models under different missing data mechanisms. We then conduct an extensive simulation study to investigate the empirical performance of the proposed methods and further illustrate the methods using real data from an HIV prevention clinical trial.
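    The zero-inflated Poisson response model mixes a point mass at zero with a Poisson component, which is how the excess zeros are accommodated. Its probability mass function is easy to state directly; the sketch below is the generic ZIP density, not the paper's full mixed-effects specification, and the parameter names are illustrative.

    ```python
    import math

    def zip_pmf(k, lam, pi):
        """P(Y = k) for a zero-inflated Poisson: with probability pi the count is
        a structural zero; otherwise it is drawn from Poisson(lam)."""
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        return pi * (1 if k == 0 else 0) + (1 - pi) * poisson
    ```

    Under this model P(Y = 0) = pi + (1 - pi) e^{-lam}, so the zero probability can greatly exceed that of a plain Poisson fit with the same mean count among the non-inflated component.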

    Exposure to secondhand smoke and asthma severity among children in Connecticut

    Objective: To determine whether secondhand smoke (SHS) exposure is associated with greater asthma severity in children with physician-diagnosed asthma living in Connecticut, and to examine whether area of residence, race/ethnicity, or poverty moderates the association.

    Methods: A large childhood asthma database in Connecticut (Easy Breathing) was linked by participant zip code to census data to classify participants by area of residence. Multinomial logistic regression models, adjusted for enrollment date, sex, age, race/ethnicity, area of residence, insurance type, family history of asthma, eczema, and exposure to dogs, cats, gas stoves, rodents, and cockroaches, were used to examine the association between self-reported SHS exposure and clinician-determined asthma severity (mild, moderate, and severe persistent vs. intermittent asthma).

    Results: Among the 30,163 children with asthma enrolled in Easy Breathing, aged 6 months to 18 years and living in 161 different towns in Connecticut, exposure to SHS was associated with greater asthma severity (adjusted relative risk ratio (aRRR): 1.07 [1.00, 1.15] and 1.11 [1.02, 1.22] for mild and moderate persistent asthma, respectively). The odds of SHS exposure for Black and Puerto Rican/Hispanic children with asthma were twice those for Caucasian children. Although the odds of SHS exposure for publicly insured children with asthma were three times those for privately insured children (OR: 3.02 [2.84, 3.21]), SHS exposure was associated with persistent asthma only among privately insured children (adjusted odds ratio (aOR): 1.23 [1.11, 1.37]).

    Conclusion: This is the first large-scale pragmatic study to demonstrate that children exposed to SHS in Connecticut have greater asthma severity, determined clinically using a systematic approach, and that the association varies by insurance status.

    The 5 Connecticut socioeconomic status (SES) categories demonstrating the equal share percentage (ESP) for family income and poverty as of 2000.

    Town of residence (by participant's zip code) was classified according to the 5 Connecticuts study as urban core, urban periphery, suburban, wealthy, or rural, as a proxy for area of residence. These categories were determined by combining town-level population density, median family income, and percent of residents living in poverty (defined as the percentage of the population below the 100% poverty threshold). The equal share line (where ESP = 0%) marks where a town's share of a variable does not differ from the statewide average. Adapted with permission from The Five Connecticuts report, Figure 7.